A major limitation of current advisory systems (e.g., intelligent tutoring systems
and expert systems) is their restricted ability to give explanations. The goal of our
research is to develop and evaluate a flexible explanation facility, one that can
dynamically generate responses to questions not anticipated by the system’s designers and
that can tailor these responses to individual users. To achieve this flexibility, we are
developing a large knowledge base, a viewpoint construction facility, and a modeling
facility.

In  the  long  term  we  plan  to  build  and  evaluate  advisory  systems  with  flexible 
explanation  facilities  for  scientists  in  numerous  domains.  In  the  short  term,  we  are 
focusing on a single complex domain in biological science, and we are working toward
two important milestones: 1) building and evaluating an advisory system with a flexible
explanation facility for freshman-level students studying biology, and 2) developing
general  methods  and  tools  for  building  similar  explanation  facilities  in  other  domains. 


Support for this research was provided by the Air Force Office of Scientific Research (contract number
F49620-93-1-0239).
1 Research Objectives

The goal of our research is to develop and evaluate a flexible explanation facility that can
dynamically generate responses to questions not anticipated by the system’s designers and
that  can  tailor  these  responses  to  individual  users.  Previous  advisory  systems  have  lacked 
these  capabilities  for  a  variety  of  reasons.  In  this  section  we  will  describe  the  problems  of 
current  advisory  systems,  the  solutions  to  these  problems  that  we  propose,  and  our  research 
activities  for  achieving  those  solutions. 

Problems.  The  explanation  facilities  of  current  advisory  systems  are  inflexible  for  two 
reasons: 

• Inadequate domain knowledge: At least two factors limit the adequacy of the knowledge
base as a source of “raw materials” for flexibly generating explanations: small
size and task specificity. Although small size is an obvious limitation, few research
projects have built a large-scale knowledge base as their “starting point” for research
on explanation. Furthermore, because the knowledge for most advisory systems supports
only a single task, most research on explanation has overlooked issues outside
the task requirements, such as answering a range of questions, explaining terminology,
and customizing explanations for specific users [12]. (For notable exceptions see work
by Moore and Swartout [23, 13].)

• Inability to reorganize knowledge: Little work has been done to develop methods to
select coherent packets of knowledge from a knowledge base, and even less on the
reorganization of portions of the knowledge base to improve specific explanations. These
issues have been avoided by “hardwiring” knowledge structures that are suitable for the
limited explanations required by a particular advisory system. (For notable exceptions
see work by McKeown [11] and Suthers [22].)

Solutions. We have developed a five-part solution to the problems of current advisory
systems. Our solution comprises: (1) constructing a knowledge base that is large-scale
and contains very fine-grained representations, (2) selecting and organizing knowledge with
viewpoints and models, (3) generating new viewpoints on demand, (4) generating explanations
that relate new information to what the user already knows, and (5) constructing
and simulating models and using them to explain the behavior of mechanisms. We briefly
describe each of these in turn, but we focus on the last one.
First, we have built an extensive knowledge base for one area of biology: college-level
anatomy and physiology of plants [16]. Although it is under constant development, it is
already one of the largest knowledge bases in existence. (Our knowledge base currently
contains about 3,000 frames and over 28,000 facts.) Unlike knowledge bases built with
instructional frames [8] or hypertext [2], our knowledge base consists of “atomic facts” that
our explanation facility can combine in different ways to produce different explanations.

Second, we have developed methods for selecting information from the knowledge base
and organizing it into a coherent bundle appropriate to the situation at hand. One organizing
structure is that of viewpoints, which provide coherent descriptions of objects or processes.
For instance, the viewpoint “photosynthesis as a production process” selects and organizes
facts to explain how photosynthesis produces glucose from carbon dioxide and water.
Another organizing structure is that of models, which are built from viewpoints and support
computer simulation. For example, an energy flow model of the plant includes the viewpoints
“photosynthesis as an energy transduction process” and “respiration as an energy transfer
process,” and it allows an advisory system to predict and explain the effects of changes in
light wavelength on a plant’s photosynthetic or respiratory rate under a variety of specific
circumstances.

Third, we have developed methods to automatically generate new viewpoints. This
ability is important because, as system designers, we cannot anticipate all the viewpoints
necessary for effective explanations. For example, Table 1 lists several viewpoints on
photosynthesis and the situations in which they might arise. Our question answering facility is able
to construct these viewpoints by selecting and reorganizing the individual facts comprising
existing viewpoints in the knowledge base (see [1]).

Fourth, we have developed methods to automatically generate integrative explanations,
which explicitly relate new information to what the user already knows. This is important
to advisory systems because the coherence of an explanation depends upon the particular
situation. Our system records the discourse with each user and explains new topics in ways
that relate to that user’s knowledge and interests (see [10]).

Finally, we have developed methods for automatically constructing and simulating models
and interpreting the consequences of simulations. These methods use existing methods of
qualitative reasoning, but add two new capabilities: constructing models from large knowledge
bases and generating explanations from these models. This allows our explanation
facility to answer “what-if” questions that were unanticipated when the knowledge base was
built (see [18]). Developing this capability has been our primary focus during the two years
of AFOSR funding, and it is the focus of the remainder of this report.

Viewpoint on photosynthesis          Contextual situation
-----------------------------------  --------------------------------------------
as a destructive process             To explain the effects of the first oxygen-
                                     producing plants on other organisms during
                                     evolution.

as an essential process in           To explain how almost all living things
ecosystem energy flow                depend on photosynthesis for deriving energy
                                     from an abiotic source.

as a magnesium-utilizing process     To explain the effects of magnesium
                                     deficiency on the plant.

as an enabling process               To explain how photosynthesis is important
                                     for any processes which use glucose or
                                     oxygen.

as a constructive process            To explain how photosynthesis is vitally
                                     important to plant growth and reproduction.

Table 1: A few of the viewpoints on photosynthesis and the teaching situations in which
they might be appropriate.

2 Automated Modeling of Complex Systems to Answer Prediction Questions

The ability to answer prediction questions is crucial in reasoning about physical systems.
The following question, from the plant physiology domain, illustrates the general form of
a prediction question: “How would decreasing soil moisture affect a plant’s transpiration*
rate?” A prediction question poses a hypothetical scenario (e.g., a plant whose soil moisture
is decreasing) and asks for the resulting behavior of specified variables of interest (e.g., the
plant’s transpiration rate). An answer to a prediction question includes the desired predictions
and, perhaps more importantly, an explanation of the assumptions and principles that
justify the predictions. In biology and ecology, such questions are important for predicting
the consequences of natural conditions and management policies as well as for teaching
biological and ecological principles. Because prediction is time consuming and error prone, and
requires people with special knowledge, automation would be valuable.

* Transpiration is the process by which water evaporates from the leaves.

A  tool  for  answering  prediction  questions  would  be  particularly  useful  for  predicting  the 
effects  of  global  climate  changes  on  plants  and  animals  in  specific  regions.  Answering  these 
questions requires considerable knowledge: general principles of plant and animal physiology
and  species  interactions  as  well  as  specific  data  on  individual  species,  climatic  events,  and 
geologic  formations.  The  central  issue  in  answering  prediction  questions  is  constructing, 
from  this  wealth  of  information,  a  model  that  captures  the  important  aspects  of  the  scenario 
and  their  relationships  to  the  variables  of  interest. 

This  section  describes  TRIPEL,  a  modeling  program  for  answering  prediction  questions. 
Section 3 defines the modeling task. Section 4 presents TRIPEL’s criteria for distinguishing
relevant aspects of the scenario from irrelevant aspects. Section 5 describes the algorithm
that uses these criteria to construct the simplest adequate model for answering a question.

While  TRIPEL  is  designed  to  support  a  wide  variety  of  domains,  it  has  been  extensively 
tested  in  the  domain  of  plant  physiology.  Specifically,  TRIPEL  has  been  used  to  answer 
questions  from  the  Botany  Knowledge  Base  [17].  The  BKB  is  a  large  (over  200,000  facts), 
multipurpose  knowledge  base  covering  plant  anatomy,  physiology,  and  development.  It  was 
developed  by  a  domain  expert.  Section  6  discusses  the  results  of  evaluating  TRIPEL  using 
the  BKB. 

Because  the  BKB  covers  many  different  physical  phenomena  at  many  levels  of  detail, 
constructing  simple  yet  adequate  models  from  it  is  a  difficult  task.  The  techniques  that  allow 
TRIPEL  to  perform  this  task  efficiently  are  applicable  throughout  science  and  engineering, 
but  they  are  especially  useful  for  biology  and  ecology. 


3  The  Modeling  Task 

TRIPEL’s inputs are a prediction question and domain knowledge. The question has two
parts: the scenario and the variables of interest. The scenario includes physical objects,
spatial relations among them, and driving conditions. Driving conditions specify the behavior of
selected variables (e.g., soil moisture is decreasing), their initial values (e.g., the temperature
is above the freezing point), or both.
TRIPEL uses the compositional modeling approach [3], in which the modeler's job is to
select those elements of domain knowledge that are needed to answer the question. Our
research focuses on building differential equation models, so the elements of domain knowledge
are the influences that pertain to the scenario.

An influence is a causal relation between two variables, as in Qualitative Process Theory
[5]. The variables are real-valued, time-varying properties of the scenario (e.g., soil moisture
or the plant’s transpiration rate). Each influence specifies that a variable, or its rate of
change, is a function of another variable.

Conceptually,  each  influence  represents  a  physical  phenomenon  in  the  scenario  at  some 
level  of  detail.  Typically,  an  influence  represents  the  effect  of  a  process  (e.g.,  the  amount  of 
water  in  the  plant  is  negatively  influenced  by  the  rate  of  transpiration)  or  a  factor  that  affects 
a  process’s  rate  (e.g.,  the  rate  at  which  the  plant  absorbs  water  from  the  soil  is  positively 
influenced  by  the  level  of  soil  moisture).  To  emphasize  their  role  in  modeling,  we  call  the  set 
of  all  influences  that  pertain  to  the  scenario  the  candidate  influences. 
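
As a minimal sketch of the representation just described, an influence can be recorded as a signed, directed relation between two variables, with a flag saying whether it acts on the target's value or on its rate of change. The class name, field names, and variable names below are illustrative assumptions, not TRIPEL's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Influence:
    """A causal relation between two variables (hypothetical schema)."""
    source: str    # the influencing variable
    target: str    # the influenced variable
    sign: int      # +1 for a positive influence, -1 for a negative one
    on_rate: bool  # True if the influence acts on the target's rate of change


# The two examples from the text, encoded under the assumptions above.
candidate_influences = [
    # The effect of a process: transpiration depletes the water in the plant.
    Influence("transpiration_rate", "plant_water", sign=-1, on_rate=True),
    # A factor affecting a process's rate: soil moisture raises absorption.
    Influence("soil_moisture", "water_absorption_rate", sign=+1, on_rate=False),
]


def influences_on(variable, influences):
    """Return the candidate influences whose target is `variable`."""
    return [inf for inf in influences if inf.target == variable]
```

Under this encoding, the candidate influences for a scenario are simply a collection of such records, and a scenario model is a subset of that collection.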

TRIPEL’s output, the scenario model, is the subset of candidate influences that are relevant
to the question. Another program, the Qualitative Process Compiler [4], built on QSIM
[9], simulates the scenario model starting from the initial state of the scenario. This
simulation generates the predictions that are needed to answer the question. A colleague at the
University of Texas is developing a program that will use the model and simulation results
to answer the question and explain the answer.


4  Modeling  Criteria 

When the domain knowledge is extensive, as with plant physiology, it will describe many
phenomena in the scenario, some at multiple levels of detail. Thus, there are two fundamental
issues in modeling. First, the modeler must decide which phenomena are relevant to the
question and which can be ignored. Second, for each relevant phenomenon, the modeler
must choose a relevant level of detail. A candidate influence is relevant if it represents a
relevant level of detail for a relevant phenomenon.
4.1 Scope

Of  the  many  phenomena  in  any  scenario,  only  a  few  are  needed  to  answer  any  particular 
question. For example, of the many processes at work in a plant, the question about decreasing
soil moisture only requires a model of the plant's water regulation processes. The
scope  of  a  model  is  the  set  of  phenomena  it  covers. 

There are two types of irrelevant phenomena. The first type, insignificant phenomena,
can be ignored because they do not significantly influence the variables of interest. For
instance,  in  our  example,  growth  processes  can  be  ignored  because  they  do  not  significantly 
influence  the  transpiration  rate. 

The  second  type  of  irrelevant  phenomena  are  those  that  can  be  treated  as  exogenous.  For 
instance,  in  our  example,  the  processes  that  regulate  soil  moisture  (e.g.,  rain  and  evaporation 
from  the  soil)  can  be  treated  as  exogenous.  Although  exogenous  phenomena  do  significantly 
influence  the  variables  of  interest,  they  are  nonetheless  irrelevant  to  the  question;  they  do  not 
help  predict  the  effects  of  the  driving  conditions  (in  our  example,  decreasing  soil  moisture) 
on the variables of interest.

To choose a suitable scope for the model, the modeler must eliminate both types of
irrelevant phenomena. To eliminate insignificant phenomena, the modeler needs criteria
for recognizing insignificant influences. By pruning insignificant influences, the modeler
disconnects the model from all the insignificant phenomena in the scenario.

TRIPEL determines whether an influence is significant using time scale information.
Processes cause significant change on widely disparate time scales. For example, in a plant,
water flows through membranes on a time scale of seconds, solutes flow through membranes
on a time scale of minutes, and growth requires hours or days. In TRIPEL, each influence
that represents an effect of a process may have associated knowledge specifying the fastest
time scale on which the effect is significant. Before constructing the scenario model, TRIPEL
automatically determines a suitable time scale of interest for the question [20]. The time
scale of interest allows TRIPEL to conclude that any candidate influence operating on a slower
time scale is insignificant. This significance criterion is used by human modelers in many
domains, including biology, ecology, and many branches of engineering [6, 15, 21].
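
The significance test can be sketched as a comparison on an ordered list of time scales. This is only an illustration under stated assumptions: the four-step scale, the function names, and the choice to treat an influence with no time scale knowledge as significant are ours, not details taken from TRIPEL.

```python
# Assumed ordering of time scales, fastest first (illustrative only).
TIME_SCALES = ["seconds", "minutes", "hours", "days"]


def is_significant(fastest_scale, scale_of_interest):
    """True unless the influence only becomes significant on a slower scale.

    `fastest_scale` is the fastest time scale on which the influence's effect
    is significant, or None when no such knowledge is attached (assumed here
    to mean the influence is kept).
    """
    if fastest_scale is None:
        return True
    return TIME_SCALES.index(fastest_scale) <= TIME_SCALES.index(scale_of_interest)


def prune_insignificant(influences, scale_of_interest):
    """Keep only the influences that are significant on the scale of interest."""
    return {name: scale for name, scale in influences.items()
            if is_significant(scale, scale_of_interest)}


# The plant example from the text: with a time scale of interest of minutes,
# growth (hours or days) is pruned while membrane flows are kept.
example = {
    "water flow through membranes": "seconds",
    "solute flow through membranes": "minutes",
    "growth": "hours",
}
kept = prune_insignificant(example, "minutes")
```

Representing time scales as an ordered list keeps the comparison qualitative, which matches the way the criterion is stated: only the ordering of scales matters, not their numeric durations.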

To eliminate exogenous phenomena, the modeler needs criteria for choosing the exogenous
variables of the model. Exogenous variables are those variables in the model whose behavior
is determined by influences that are outside the scope of the model. All other variables in
the model are dependent; their behavior is determined by influences in the model. Thus,
the exogenous variables constitute the boundary of the model, separating the model from
exogenous phenomena in the scenario. For instance, in our example, by treating soil moisture
as an exogenous variable, the processes that regulate soil moisture are excluded from the
model.

To determine whether a variable in the model can be treated as exogenous, TRIPEL uses
two criteria. First, by definition, the variable must not be significantly influenced, in the
scenario, by any other variable in the model. One variable significantly influences another
if there is a chain of candidate influences leading from the first variable to the second and
every influence in the chain is significant. Second, note that the objective in a prediction
question is to predict the effects of the driving variables on the variables of interest. A driving
variable is one whose behavior or initial state is specified in the question (in our example, soil
moisture). To meet that objective, the modeler must ensure that the exogenous variables
do not separate the model from the driving variables of the question. Therefore, a variable
in a model can be treated as exogenous only if it is not significantly influenced, in the
scenario, by any driving variable of the question. TRIPEL tests these two criteria using a
graph connectivity algorithm on the candidate influences [20].
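
The two criteria amount to reachability tests on the graph of significant candidate influences. The sketch below uses a plain breadth-first search; it is not the algorithm of [20], and the function names, edge encoding, and example influences are assumptions made for illustration.

```python
from collections import defaultdict, deque


def significantly_influenced_by(start, edges):
    """Variables reachable from `start` along chains of significant influences.

    `edges` is an iterable of (source, target) pairs, one per significant
    candidate influence (insignificant ones are assumed already pruned).
    """
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def can_be_exogenous(variable, model_variables, driving_variables, edges):
    """Apply both criteria: no significant influence from any other model
    variable, and none from any driving variable of the question."""
    for other in set(model_variables) | set(driving_variables):
        if other != variable and variable in significantly_influenced_by(other, edges):
            return False
    return True


# Illustrative influence graph for the soil moisture question.
edges = [
    ("rainfall", "soil_moisture"),
    ("soil_moisture", "water_absorption_rate"),
    ("water_absorption_rate", "plant_water"),
    ("plant_water", "transpiration_rate"),
]
model = {"soil_moisture", "water_absorption_rate", "plant_water", "transpiration_rate"}
driving = {"soil_moisture"}
```

In this example soil moisture can be exogenous, because nothing in the model or among the driving variables influences it, whereas the amount of water in the plant cannot, because it is downstream of the driving variable.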

In summary, TRIPEL eliminates irrelevant phenomena from the scope of the model by
pruning insignificant influences (using time scale information) and by choosing suitable
exogenous variables for the model. Phenomena that do not significantly influence the variables
of interest, or that influence the variables of interest only through exogenous variables, are
not included in the model (at any level of detail).


4.2  Level  of  Detail 

The  domain  knowledge  may  provide  multiple  levels  of  detail  for  many  phenomena  in  the 
scenario.  For  example,  water  in  the  plant  can  be  treated  as  an  aggregate,  or  the  water 
in  the  roots,  stem  and  leaves  can  be  modeled  individually.  Similarly,  processes  can  be 
aggregated. For example, the chemical formula for photosynthesis summarizes the net effects
of  its  component  reactions.  Also,  the  dynamics  of  a  process  can  often  be  summarized  by 
its  equilibrium  results.  For  example,  when  the  level  of  solutes  in  a  plant  cell  changes,  the 
process  of  osmosis  adjusts  the  cell’s  water  to  a  new  equilibrium  level.  If  the  dynamics  of  this 
process are irrelevant, the modeler can simply treat the level of water as an instantaneous
function of the level of solutes. Each of these types of alternatives arises in many areas of
science  and  engineering. 
For each relevant phenomenon in the scenario, the modeler must choose a suitable level
of detail. Irrelevant details complicate simulation and make the resulting explanation less
comprehensible, so the modeler must choose the simplest level of detail that is adequate for
answering the question.

TRIPEL has several criteria for choosing the level of detail. First, some approximations
may be invalid in the context of the question. For example, process dynamics can only
be summarized by their equilibrium result if the process reaches equilibrium very quickly
relative to the time scale of interest. TRIPEL includes a variety of general principles for
recognizing that a level of detail is invalid or inadequate for a question.

Second, TRIPEL includes coherence criteria. These ensure that the levels of detail chosen
for different phenomena in the model are compatible. The coherence criteria also ensure
that the model does not include different levels of detail for any single phenomenon.
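
The second coherence requirement, at most one level of detail per phenomenon, is easy to state as a check. The sketch below covers only that requirement, not the compatibility criteria across phenomena, and the pair encoding and function name are assumptions for illustration.

```python
def has_single_level_per_phenomenon(elements):
    """Check that no phenomenon appears at two different levels of detail.

    `elements` is an iterable of (phenomenon, level_of_detail) pairs, one per
    model element (a hypothetical encoding).
    """
    chosen = {}
    for phenomenon, level in elements:
        # setdefault records the first level seen and returns it afterwards.
        if chosen.setdefault(phenomenon, level) != level:
            return False
    return True


coherent = [("plant water", "aggregate"), ("osmosis", "equilibrium")]
incoherent = [("plant water", "aggregate"), ("plant water", "roots, stem, leaves")]
```

Here the second list is rejected because it models the water in the plant both as an aggregate and organ by organ.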

Finally, for those alternatives that are adequate for the question and coherent with other
parts of the model, TRIPEL chooses the one that leads to the simplest adequate model. While
any simplicity criterion could be used, TRIPEL defines one model as simpler than another if it
has fewer variables. The number of variables in a model is a good heuristic measure of the
complexity of simulation and of the model’s comprehensibility.

In summary, the domain knowledge often provides alternative levels of detail for relevant
phenomena, and the modeler must determine which level is relevant. In TRIPEL, a level of
detail is relevant if it is adequate for answering the question, is coherent with other elements
of the model, and leads to the simplest adequate model.


5  Modeling  Algorithm 

Each candidate influence represents some phenomenon at some level of detail, so TRIPEL’s
criteria for choosing scope and level of detail allow it to determine the influences that should
be included in the scenario model. This section explains TRIPEL’s algorithm for selecting
the relevant influences.

TRIPEL conducts a best-first search for the simplest adequate scenario model for the
question. Each state in the search space is a partial model, a model whose scope may not
include all relevant phenomena. A partial model may contain free variables (variables not
yet chosen as exogenous or dependent). The initial state in the search is a partial model
consisting only of the variables of interest, all free. The successor function, described below,
extends the scope of a partial model to include any additional phenomena relevant to the
free variables, possibly adding new free variables. A partial model has multiple successors
when there are alternative levels of detail for the new phenomena. A partial model is
pruned from the search if it is incoherent (i.e., violates the coherence criteria) or invalid (i.e.,
includes an invalid level of detail); any extension of an incoherent/invalid partial model is
also incoherent/invalid. The search ends when an adequate model is found that is at least as
simple as all remaining partial models; these partial models can only grow. The simplicity
criterion (i.e., number of variables in the model) also serves as the evaluation function for
the best-first search.

The successor function, extend-model, extends the scope of a partial model.
Extend-model first determines whether all the free variables in the partial model can be exogenous,
as described in Section 4.1. If so, it marks each one as exogenous and returns the resulting
model, which is now complete. Otherwise, it chooses a free variable that must be dependent,
and it determines all combinations of candidate influences on that variable that include every
significant influencing phenomenon at some level of detail (multiple combinations arise from
alternative levels of detail for these phenomena). Extend-model returns a set of new partial
models, each the result of extending the original partial model with one of the combinations.

To extend the original partial model with a combination of candidate influences,
extend-model adds the influences to the model, marks the chosen free variable as dependent, and adds
any new free variables to the model. The new free variables are those variables referenced
by the new influences but not already in the model (e.g., an influencing variable).
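
The search and its successor function can be sketched in a few lines. This is a simplified illustration, not TRIPEL's implementation: the domain knowledge is reduced to a table of alternative influencer sets per variable, the exogeneity and coherence/validity tests are passed in as callbacks, and all names and example influences are assumptions.

```python
import heapq
from itertools import count


def simplest_model(variables_of_interest, alternatives, can_be_exogenous,
                   is_acceptable=lambda dependent: True):
    """Best-first search for the model with the fewest variables.

    alternatives[v]  -- list of alternative tuples of influencing variables
                        for v, one tuple per level of detail (assumed encoding)
    can_be_exogenous -- callable(variable, model_variables) -> bool
    is_acceptable    -- callable(dependent) -> bool; return False to prune an
                        incoherent or invalid partial model
    """
    tie = count()  # unique tie-breaker so the heap never compares dicts
    free = frozenset(variables_of_interest)
    agenda = [(len(free), next(tie), {}, free)]
    while agenda:
        _, _, dependent, free = heapq.heappop(agenda)
        model_vars = set(dependent) | free
        must_depend = sorted(v for v in free
                             if not can_be_exogenous(v, model_vars))
        if not must_depend:
            # Every free variable can be exogenous: the model is complete, and
            # no remaining partial model has fewer variables.
            return {"dependent": dependent, "exogenous": set(free)}
        chosen = must_depend[0]
        for influencers in alternatives.get(chosen, []):
            new_dependent = {**dependent, chosen: set(influencers)}
            new_free = (free - {chosen}) | (set(influencers) - model_vars)
            if is_acceptable(new_dependent):
                size = len(new_dependent) + len(new_free)
                heapq.heappush(agenda, (size, next(tie), new_dependent, new_free))
    return None  # no adequate model exists


# Illustrative alternatives: plant water as an aggregate, or split by organ.
alternatives = {
    "transpiration_rate": [("leaf_water",), ("plant_water",)],
    "leaf_water": [("stem_water", "transpiration_rate")],
    "stem_water": [("root_water",)],
    "root_water": [("soil_moisture",)],
    "plant_water": [("soil_moisture", "transpiration_rate")],
}
result = simplest_model(
    ["transpiration_rate"], alternatives,
    can_be_exogenous=lambda v, model_vars: v == "soil_moisture")
```

With these illustrative alternatives the search returns the three-variable aggregate model (transpiration rate and plant water dependent, soil moisture exogenous) rather than the five-variable organ-level one. Because extending a partial model never removes variables, the first complete model popped from the agenda is at least as simple as anything still on it.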

This algorithm is guaranteed to return the simplest adequate scenario model whenever
an adequate scenario model exists. To see this, note that each partial model represents all
its extensions. Thus, the initial partial model in the search represents all scenario models
that include the variables of interest. Conceptually, the guarantee results from the following
strategy:

•  From  the  space  of  all  possible  scenario  models,  the  algorithm  repeatedly  prunes  away 
models  until  only  a  single  scenario  model  (if  any)  remains. 

•  It  never  prunes  a  scenario  model  unless  either  (1)  the  model  is  inadequate  for  the 
question or (2) if the model is adequate, there is an adequate scenario model still
under  consideration  (i.e.,  that  is  an  extension  of  a  partial  model  on  the  search  agenda) 
that  is  at  least  as  simple. 
For the details of the proof, see [19].

6  Evaluation 

To evaluate TRIPEL, we tested it on a variety of prediction questions concerning the
physiology of a prototypical plant. The questions were generated by a domain expert. Each
question specifies the qualitative behavior of one variable (e.g., soil moisture is decreasing)
and asks for the resulting behavior of another (e.g., transpiration rate).

The  domain  knowledge  was  provided  by  the  Botany  Knowledge  Base  (BKB)  [17].  The 
BKB is a large (over 200,000 facts), multipurpose knowledge base covering plant anatomy,
physiology, and development. It was developed by a domain expert. The BKB provides 691
variables representing properties of a plant and its environment (soil and atmosphere), and
it  provides  1507  candidate  influences  among  them.  The  candidate  influences  cover  many 
different  types  of  processes,  including  water  regulation,  metabolism,  temperature  regulation, 
and  transportation  of  gasses  and  solutes.  These  processes  operate  on  many  different  time 
scales.  Many  phenomena  covered  by  the  BKB  are  represented  at  multiple  levels  of  detail,  as 
described  in  Section  4.2. 

The  evaluation,  described  in  detail  in  [19],  suggests  that  TRIPEL  is  already  an  effective 
modeling  program.  Despite  the  size  of  the  BKB,  TRIPEL  typically  generates  simple,  adequate 
models,  as  judged  by  a  domain  expert.  Models  ranged  in  size  from  3  variables  to  93  variables, 
and more than half had fewer than 20 variables. Furthermore, the knowledge TRIPEL requires
to construct these models is fundamental plant physiology knowledge that is natural for a
domain expert to encode.

The evaluation also identified the most important limitation of TRIPEL: its criterion for
determining whether one variable significantly influences another should be more
sophisticated. Currently, TRIPEL concludes that one variable significantly influences another if there
is a chain of influences connecting them and every influence in the chain is significant on the
time scale of interest. The evaluation suggests that TRIPEL should also consider extra time
lags due to the length of the chain or the spatial distance it covers. Due to this limitation,
TRIPEL sometimes chooses a time scale of interest that is too fast, and it sometimes includes
irrelevant elements in models. TRIPEL is designed to easily incorporate additional criteria
for determining the significance of influences and chains of influences, so the main challenge
for future research is simply to formulate the criteria.
7 Related Work

The modeling programs of Falkenhainer and Forbus [3], Nayak [14], and Iwasaki and Levy
[7] are most similar to TRIPEL. The program of Falkenhainer and Forbus is notable for its
contrasting method of selecting the scope, and Nayak’s program is notable for its contrasting
method of constructing a model (it builds an overly complex model and then repeatedly
simplifies it). The modeling algorithm developed by Iwasaki and Levy is most similar to
TRIPEL’s algorithm, although their algorithm cannot automatically choose exogenous
variables. For a detailed comparison between TRIPEL and these programs, see [20] and [19].

8  Conclusions 

The primary results of our research are three-fold. First, we developed general methods for
building intelligent tutoring systems that teach prediction. Second, we built a substantial
tutoring system for the task of prediction and experimentally evaluated it. Third, we built
an extensive knowledge base for college-level biology and developed prototype software for
answering questions with coherent explanations. From this experience, we have learned how
to structure large knowledge bases using viewpoints, and we have created a foundation on
which to build tutoring systems for a wide variety of prediction tasks.